Cross-Validation and Minimum Generation Error based Decision Tree Pruning for HMM-based Speech Synthesis

نویسندگان

  • Heng Lu
  • Zhen-Hua Ling
  • Li-Rong Dai
  • Ren-Hua Wang
چکیده

This paper presents a decision tree pruning method for the model clustering of HMM-based parametric speech synthesis by cross-validation (CV) under the minimum generation error (MGE) criterion. Decision-tree-based model clustering is an important component in the training process of an HMM based speech synthesis system. Conventionally, the maximum likelihood (ML) criterion is employed to choose the optimal contextual question from the question set for each tree node split and the minimum description length (MDL) principle is introduced as the stopping criterion to prevent building overly large tree models. Nevertheless, the MDL criterion is derived based on an asymptotic assumption and is problematic in theory when the size of the training data set is not large enough. Besides, inconsistency exists between the MDL criterion and the aim of speech synthesis. Therefore, a minimum cross generation error (MCGE) based decision tree pruning method for HMM-based speech synthesis is proposed in this paper. The initial decision tree is trained by MDL clustering with a factor estimated using the MCGE criterion by cross-validation. Then the decision tree size is tuned by backing-off or splitting each leaf node iteratively to minimize a cross generation error, which is defined to present the sum of generation errors calculated for all training sentences using cross-validation. Objective and subjective evaluation results show that the proposed method outperforms the conventional MDL-based model clustering method significantly.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Minimum generation error criterion for tree-based clustering of context dependent HMMs

Due to the inconsistency between HMM training and synthesis application in HMM-based speech synthesis, the minimum generation error (MGE) criterion had been proposed for HMM training. This paper continues to apply the MGE criterion for tree-based clustering of context dependent HMMs. As directly applying the MGE criterion results in an unacceptable computational cost, the parameter updating rul...

متن کامل

Autoregressive clustering for HMM speech synthesis

The autoregressive HMM has been shown to provide efficient parameter estimation and high-quality synthesis, but in previous experiments decision trees derived from a non-autoregressive system were used. In this paper we investigate the use of autoregressive clustering for autoregressive HMM-based speech synthesis. We describe decision tree clustering for the autoregressive HMM and highlight dif...

متن کامل

Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis

A minimum generation error (MGE) criterion had been proposed to solve the issues related to maximum likelihood (ML) based HMM training in HMM-based speech synthesis. In this paper, we improve the MGE criterion by imposing a log spectral distortion (LSD) instead of the Euclidean distance to define the generation error between the original and generated line spectral pair (LSP) coefficients. More...

متن کامل

Overview of NIT HMM - based speech synthesis system for Blizzard Challenge 2010

This paper describes a hidden Markov model (HMM)-based speech synthesis system developed for the Blizzard Challenge 2010. This system employs STRAIGHT vocoding, minimum generation error (MGE) training, minimum generation error linear regression (MGELR) based model adaptation, the Bayesian speech synthesis framework, and the parameter generation algorithm considering global variance. The real-ti...

متن کامل

Generating intonation from a mixed CART-HMM model for speech synthesis

This paper proposes two algorithms for generating intonation from a mixed CART-HMM intonation model for speech synthesis. Based either on a Viterbi search or on the Expectation-Maximization algorithm, the two generation algorithms are analyzed in terms of likelihood and F0 Root Mean Square Error. Listening tests are performed to subjectively evaluate the quality of the generated intonation.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJCLCLP

دوره 15  شماره 

صفحات  -

تاریخ انتشار 2010